Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(swing-store): budget-limited deletion of snapshot and transcripts #9478

Merged
merged 1 commit into from
Aug 12, 2024

Conversation

warner
Copy link
Member

@warner warner commented Jun 10, 2024

Both snapStore.deleteVatSnapshots() and
transcriptStore.deleteVatTranscripts() now take a numeric budget= argument, which will limit the number of snapshots or transcript spans deleted in each call. Both return a { done, cleanups } record so the caller knows when to stop calling.

This enables the slow deletion of large vats (lots of transcript spans or snapshots), a small number of items at a time. Recommended budget is 5, which (given SwingSet's snapInterval=200 default) will cause the deletion of 1000 rows from the transcriptItems table each call, which shouldn't take more than 100ms.

Without this, the kernel's attempt to slowly delete a terminated vat would succeed in slowly draining the kvStore, but would trigger a gigantic SQL transaction at the end, as it deleted every transcript item in the vat's history. The worst-case example I found would be the mainnet chain's v43-walletFactory, which (as of apr-2024) has 8.2M transcript items in 40k spans. A fast machine takes two seconds just to count all the items, and deletion took 22 minutes, with a swingstore.wal file that peaked at 27 GiB. This would cause an enormous chain stall at some surprising point in time weeks or months after the vat was first terminated. In addition, both the transcript spans and the snapshot records are shadowed into IAVL (via export-data) for integrity, and deleting 40k+40k=80k IAVL records in a single block might cause some significant churn too.

The kernel should call transcriptStore.stopUsingTranscript() and snapStore.stopUsingLastSnapshot() as soon as the vat is terminated, to make exports smaller right away (by omitting all transcript/snapshot artifacts for the given vat, even before those DB rows or their export-data records have been deleted).

New swing-store documentation was added.

refs #8928

@warner warner added SwingSet package: SwingSet swing-store labels Jun 10, 2024
@warner warner requested a review from mhofman June 10, 2024 17:36
Copy link

cloudflare-workers-and-pages bot commented Jun 10, 2024

Deploying agoric-sdk with  Cloudflare Pages  Cloudflare Pages

Latest commit: c43bf63
Status: ✅  Deploy successful!
Preview URL: https://00bb9404.agoric-sdk.pages.dev
Branch Preview URL: https://warner-8980-swingstore-delet.agoric-sdk.pages.dev

View logs

@warner
Copy link
Member Author

warner commented Jun 10, 2024

Security Considerations

None. Swingstore exports during slow deletion continue to be coherent (export-data records remain until deleted, artifacts stop right away), unit tests demonstrate the importability of these exports. There were no changes to the import-side code, including the hash validation pathways.

Scaling Considerations

This change enables better scaling, by allowing the kernel to delete terminated vats slowly, and avoid overloading validator nodes with tons of DB or IAVL work in a single block.

Documentation Considerations

This PR introduces extensive new swing-store documentation, including coverage of the new deletion-budget feature.

Testing Considerations

The upcoming swingset changes that take advantage of this API should have their own unit tests, which exercise slow deletion.

We'll also need to perform integration/performance tests with real mainnet chain data, and make sure we're satisfied with our choices for deletion rate / budget. We must choose values that delete the old price-feed vats in a timely fashion (weeks or months), and does not overload validators with the constant "background" cleanup work.

Upgrade Considerations

None: the old (unlimited) API continues to work the same as before. It only changes behavior if you provide a new positional budget argument to the deletion calls.

@warner warner force-pushed the warner/8980-boyd-scheduler branch from 3200c52 to a698f7b Compare June 12, 2024 02:15
@warner warner force-pushed the warner/8980-swingstore-deletion-budget branch from e2f75f4 to 472dd22 Compare June 12, 2024 02:15
@warner warner force-pushed the warner/8980-boyd-scheduler branch from a698f7b to 2e3641a Compare June 13, 2024 18:12
@warner warner force-pushed the warner/8980-swingstore-deletion-budget branch from 472dd22 to b43f4da Compare June 13, 2024 18:12
@warner warner force-pushed the warner/8980-boyd-scheduler branch from 2e3641a to fdecbe5 Compare June 14, 2024 22:52
@warner warner force-pushed the warner/8980-swingstore-deletion-budget branch 2 times, most recently from 0126e7d to 85c31ab Compare June 14, 2024 23:07
@warner warner force-pushed the warner/8980-boyd-scheduler branch from fdecbe5 to 796a2d3 Compare June 14, 2024 23:07
@warner warner force-pushed the warner/8980-boyd-scheduler branch from 796a2d3 to 2611cb9 Compare June 24, 2024 16:29
@warner warner force-pushed the warner/8980-swingstore-deletion-budget branch 2 times, most recently from 0868981 to e7be6b3 Compare June 26, 2024 19:20
@warner warner force-pushed the warner/8980-boyd-scheduler branch 2 times, most recently from c05be76 to 70300ac Compare June 27, 2024 21:39
@warner warner force-pushed the warner/8980-swingstore-deletion-budget branch from e7be6b3 to 1914088 Compare June 27, 2024 21:39
@warner warner force-pushed the warner/8980-boyd-scheduler branch from 70300ac to e79c29c Compare July 2, 2024 06:03
@warner warner force-pushed the warner/8980-swingstore-deletion-budget branch from 1914088 to 1ba5547 Compare July 2, 2024 06:03
@warner warner force-pushed the warner/8980-swingstore-deletion-budget branch from 1ba5547 to afe1c29 Compare July 10, 2024 17:05
@warner warner force-pushed the warner/8980-boyd-scheduler branch from e79c29c to 691b43a Compare July 10, 2024 17:05
@warner warner requested a review from gibson042 July 10, 2024 17:11
@warner warner force-pushed the warner/8980-boyd-scheduler branch from 691b43a to ede07c7 Compare July 11, 2024 04:38
@warner warner force-pushed the warner/8980-swingstore-deletion-budget branch from afe1c29 to 49ba122 Compare July 11, 2024 04:38
@warner warner force-pushed the warner/8980-boyd-scheduler branch from ede07c7 to a1b9efe Compare July 11, 2024 23:51
@warner warner force-pushed the warner/8980-swingstore-deletion-budget branch from 49ba122 to 77c46d1 Compare July 11, 2024 23:51
Copy link
Member

@gibson042 gibson042 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just doc changes for this first-round review; next I'll look at the code. But by the way, 👍👍 for making the former so thorough.

packages/swing-store/docs/bundlestore.md Outdated Show resolved Hide resolved
packages/swing-store/docs/bundlestore.md Outdated Show resolved Hide resolved
packages/swing-store/docs/bundlestore.md Outdated Show resolved Hide resolved
packages/swing-store/docs/bundlestore.md Outdated Show resolved Hide resolved
packages/swing-store/docs/bundlestore.md Outdated Show resolved Hide resolved
packages/swing-store/docs/swingstore.md Outdated Show resolved Hide resolved
packages/swing-store/docs/swingstore.md Outdated Show resolved Hide resolved
packages/swing-store/docs/swingstore.md Outdated Show resolved Hide resolved
packages/swing-store/docs/swingstore.md Outdated Show resolved Hide resolved
packages/swing-store/docs/swingstore.md Outdated Show resolved Hide resolved
@gibson042 gibson042 self-requested a review July 17, 2024 23:10
Copy link
Member

@gibson042 gibson042 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once again, great testing. No blocking concerns.

packages/swing-store/src/snapStore.js Outdated Show resolved Hide resolved
@@ -39,7 +39,7 @@ import { buffer } from './util.js';
* loadSnapshot: (vatID: string) => AsyncIterableIterator<Uint8Array>,
* saveSnapshot: (vatID: string, snapPos: number, snapshotStream: AsyncIterable<Uint8Array>) => Promise<SnapshotResult>,
* deleteAllUnusedSnapshots: () => void,
* deleteVatSnapshots: (vatID: string) => void,
* deleteVatSnapshots: (vatID: string, budget?: number) => { done: boolean, cleanups: number },
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: rename "cleanups" for clarity/comprehensibility.

Suggested change
* deleteVatSnapshots: (vatID: string, budget?: number) => { done: boolean, cleanups: number },
* deleteVatSnapshots: (vatID: string, budget?: number) => { done: boolean, count: number },

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll look into it.. that'll require matching in the kernel, in the next PR in this series

packages/swing-store/src/snapStore.js Show resolved Hide resolved
packages/swing-store/src/snapStore.js Outdated Show resolved Hide resolved
packages/swing-store/src/snapStore.js Outdated Show resolved Hide resolved
packages/swing-store/src/transcriptStore.js Outdated Show resolved Hide resolved
packages/swing-store/src/transcriptStore.js Outdated Show resolved Hide resolved
packages/swing-store/src/transcriptStore.js Outdated Show resolved Hide resolved
packages/swing-store/src/transcriptStore.js Show resolved Hide resolved
packages/swing-store/test/deletion.test.js Show resolved Hide resolved
@warner warner force-pushed the warner/8980-swingstore-deletion-budget branch from c862b45 to 227dafa Compare August 10, 2024 23:57
@warner warner force-pushed the warner/8980-boyd-scheduler branch 2 times, most recently from 2fef729 to 1f7943e Compare August 11, 2024 16:38
@warner warner force-pushed the warner/8980-swingstore-deletion-budget branch from 227dafa to 486dfbb Compare August 11, 2024 16:38
@warner warner force-pushed the warner/8980-boyd-scheduler branch from 1f7943e to 6547c83 Compare August 12, 2024 21:46
@warner warner force-pushed the warner/8980-swingstore-deletion-budget branch from 486dfbb to f6787e8 Compare August 12, 2024 22:01
Base automatically changed from warner/8980-boyd-scheduler to master August 12, 2024 22:42
@warner warner added the automerge:rebase Automatically rebase updates, then merge label Aug 12, 2024
Both `snapStore.deleteVatSnapshots()` and
`transcriptStore.deleteVatTranscripts()` now take a numeric `budget=`
argument, which will limit the number of snapshots or transcript spans
deleted in each call. Both return a `{ done, cleanups }` record so the
caller knows when to stop calling.

This enables the slow deletion of large vats (lots of transcript spans
or snapshots), a small number of items at a time. Recommended budget
is 5, which (given SwingSet's `snapInterval=200` default) will cause
the deletion of 1000 rows from the `transcriptItems` table each call,
which shouldn't take more than 100ms.

Without this, the kernel's attempt to slowly delete a terminated vat
would succeed in slowly draining the kvStore, but would trigger a
gigantic SQL transaction at the end, as it deleted every transcript
item in the vat's history. The worst-case example I found would be the
mainnet chain's v43-walletFactory, which (as of apr-2024) has 8.2M
transcript items in 40k spans. A fast machine takes two seconds just
to count all the items, and deletion took 22 *minutes*, with a
`swingstore.wal` file that peaked at 27 GiB. This would cause an
enormous chain stall at some surprising point in time weeks or months
after the vat was first terminated. In addition, both the transcript
spans and the snapshot records are shadowed into IAVL (via
`export-data`) for integrity, and deleting 40k+40k=80k IAVL records in
a single block might cause some significant churn too.

The kernel should call `transcriptStore.stopUsingTranscript()` and
`snapStore.stopUsingLastSnapshot()` as soon as the vat is terminated,
to make exports smaller right away (by omitting all
transcript/snapshot artifacts for the given vat, even before those DB
rows or their export-data records have been deleted).

New swing-store documentation was added.

refs #8928

Co-authored-by: Richard Gibson <richard.gibson@gmail.com>
@warner warner force-pushed the warner/8980-swingstore-deletion-budget branch from f6787e8 to c43bf63 Compare August 12, 2024 22:54
@mergify mergify bot merged commit 0eaf7cc into master Aug 12, 2024
80 checks passed
@mergify mergify bot deleted the warner/8980-swingstore-deletion-budget branch August 12, 2024 23:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
automerge:rebase Automatically rebase updates, then merge swing-store SwingSet package: SwingSet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants